Advanced / optional materials
2023-01-01
We can define power as the probability of rejecting the null hypothesis (\(H_0\)) when the hypothesized alternative hypothesis (\(H_{1}\)) is true.
It usually depends on the sample size, the effect size, and the significance level (\(\alpha\)).
If the study is underpowered, our results are inconclusive and, in case we do reject \(H_0\), the estimates are likely inflated.
For power calculations we can use analytical formulas (for simple designs) or simulation.
In a SEM framework there are many ways to calculate power via simulation. We will only implement it manually in R. Here is a list of alternatives:
- pwrSEM shiny app (Wang et al., 2021): https://doi.org/10.1177/2515245920918253
- simsem package (for lavaan objects): https://simsem.org/ (see also Beaujean, 2014, chapter 8)
- semPower package
- power4SEM shiny app (Jak et al., 2021): https://doi.org/10.3758/s13428-020-01479-0
- MBESS package (ss.aipe.rmsea(); ss.aipe.sem.path(); ss.power.sem())

Or you can apply rules of thumb that recommend either absolute minimum sample sizes (e.g., \(N = 100\) or \(200\); Boomsma, 1982, 1985) or sample sizes based on model complexity (e.g., \(n = 5\)–\(10\) per estimated parameter, Bentler & Chou, 1987; \(n = 3\)–\(6\) per variable, Cattell, 1978). BUT, NO, PLEASE!
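For simple designs, power can also be computed analytically in base R. A minimal sketch (the design and numbers here are illustrative, not from the examples below):

```r
# Power of a two-sample t-test with n = 64 per group,
# a medium effect (delta = 0.5 SD), and alpha = .05
pw <- power.t.test(n = 64, delta = 0.5, sd = 1, sig.level = .05)
pw$power # approximately .80
```

This is the classic result that a medium effect needs roughly 64 cases per group for 80% power.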
for loops
A fundamental skill that you should possess is writing for (or while) loops in R. This is crucial for estimating power via simulation, but also for many other ordinary analyses.

With a for loop you can iterate one or more actions along a sequence of values. These values usually go from 1 to N, but they can also be characters, unordered sequences, or sequences of increasing but random numbers.
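A minimal example of the idea (the printed message is arbitrary):

```r
# Print the iteration number and a message at each step
for (i in 1:3) {
  print(i)
  print("Well done!")
}
```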
library(ggplot2)
library(gganimate)
# Example data for the animation (hypothetical values)
d <- data.frame(X = 1:10, M = cumsum(rnorm(10)))
p <- ggplot(d, aes(x = X, y = M)) +
geom_point(size = 5, color = "red") +
geom_line() +
theme_bw()
panim <- p +
gganimate::transition_reveal(X) +
gganimate::shadow_wake(wake_length = 0.1, alpha = FALSE)
gganimate::anim_save("../assets/images/power/forimg.gif", panim)
for loops for power analysis
If we want to run a power analysis via simulation using a for loop, in general we need the following structure and elements:
library(lavaan)
# A set of basic info
iter = 100 # number of desired iterations (i)
N = 400 # the sample size we test
population = 1e6 # size of the simulated population
# A population with hypothesized effects
x = rnorm(population); z = rnorm(population) # predictors
y = .30*x + .20*z + rnorm(population) # hypothetical model with ES
d = data.frame(x, y, z)
# A set of objects to store the results
p <- c()
# A model
m <- "y ~ x + z"
# The loop with minimum three parts
for (i in 1:iter) {
# Data simulation/sampling
dfor <- d[sample(1:nrow(d), N), ]
# Test
fit <- sem(m, dfor)
# Storage
p[i] <- parameterEstimates(fit)[1, "pvalue"]
}
mean(p < .05) # calculate power

You can do the same thing with any model (lm, glm, mclust, meta-analysis…).
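As a sketch of the same three-part structure with lm() instead of sem() (the effect sizes mirror the example above, but the settings are illustrative):

```r
set.seed(1)
iter <- 200 # number of iterations
N <- 400    # sample size to test
p <- numeric(iter)
for (i in 1:iter) {
  # Data simulation: a fresh sample at each iteration
  x <- rnorm(N); z <- rnorm(N)
  y <- .30 * x + .20 * z + rnorm(N)
  # Test and storage: keep the p-value of the x coefficient
  p[i] <- summary(lm(y ~ x + z))$coefficients["x", "Pr(>|t|)"]
}
mean(p < .05) # estimated power for the x effect
```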
Does adaptability predict academic achievement?
H: adaptability directly influences achievement with an effect of .20.
library(lavaan)
library(MASS)
# Define the correlation matrix of the two variables (r = .20)
CovMat <- lav_matrix_lower2full(c(
1,
.20, 1
))
colnames(CovMat) <- rownames(CovMat) <- c("Ad", "Gp")
# The for cycle
N = 200 # Is this N sufficient to have power?
iter = 500 # How many iterations do we want?
p = c() # store p-values
b = c() # store betas
for (i in 1:iter) {
# Simulate the data
db <- as.data.frame(
mvrnorm(n = N, mu = c(0, 0), Sigma = CovMat,
empirical = FALSE) # EMPIRICAL IS FALSE HERE!
)
# Fit the model
m <- lm(Gp ~ Ad, data = db)
# Storage
p[i] <- summary(m)$coefficients["Ad", "Pr(>|t|)"]
b[i] <- summary(m)$coefficients["Ad", "Estimate"]
}
mean(p < .05) # calculate power
mean(b > .15) # ...

There are too many things that are important for achievement to ignore them!
# A more complete matrix Ad Em Se Sr Gp
CovMat <- lav_matrix_lower2full(c(
1,
.30, 1,
.40, .30, 1,
.40, .30, .35, 1,
.20, .15, .45, .30, 1
))
colnames(CovMat) <- rownames(CovMat) <- c("Ad", "Em", "Se", "Sr", "Gp")
# For cycle
iter = 1000
N = 5000
p = as.data.frame(matrix(nrow = iter, ncol = 5))
colnames(p) <- c("intercept", "ad", "em", "se", "sr")
for (i in 1:iter) {
db <- data.frame(mvrnorm(n = N, mu = c(0, 0, 0, 0, 0),
CovMat, empirical = FALSE))
m <- lm(Gp ~ Ad + Em + Se + Sr, data = db)
p[i, ] <- summary(m)$coefficients[, "Pr(>|t|)"]
}
colMeans(p < .05) # power for each effect
mean(rowSums(p[, 2:5] < .05) == 4) # power for all effects together

However, we know that all these variables interact, and we also have a hypothetical pattern of associations suggesting that adaptability has no direct link to achievement: adaptability is associated with achievement only through the mediation of emotions, self-efficacy, and self-regulation.
We want to test whether the indirect effects are significant. Nothing else!
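The path model itself is not shown in these notes; a plausible lavaan specification, with labeled paths and defined indirect effects (all names and labels here are assumptions), would look like this:

```r
# Assumed path model: Ad -> Em/Se/Sr -> Gp, with no direct Ad -> Gp path
path <- "
  Em ~ a1*Ad
  Se ~ a2*Ad
  Sr ~ a3*Ad
  Gp ~ b1*Em + b2*Se + b3*Sr
  # Defined indirect effects
  ind_em := a1*b1
  ind_se := a2*b2
  ind_sr := a3*b3
"
```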
iter = 1000
N = 500
# Store only the three indirect effects
ps = as.data.frame(matrix(nrow = iter, ncol = 3))
colnames(ps) <- c("em", "se", "sr")
for (i in 1:iter) {
db <- data.frame(mvrnorm(n = N, mu = c(0, 0, 0, 0, 0),
CovMat, empirical = FALSE))
fit <- sem(path, data = db)
ps[i, ] <- subset(parameterEstimates(fit), op == ":=")$pvalue # p-values of the defined indirect effects
}
colMeans(ps < .05) # specific powers
colMeans(ps < .01)
mean(rowSums(ps < .05) == 3) # total power: all 3 indirect effects
mean(rowSums(ps[, c("se", "sr")] < .05) == 2)
mean(rowSums(ps[, c("se", "sr")] < .01) == 2)
Note that we simulated the data directly from a correlation matrix, used as a covariance matrix.
An easy way to do all these steps is to use the simulateData() function in lavaan, which makes it easier to simulate the data:
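A minimal sketch of simulateData() (the population model and numbers are illustrative):

```r
library(lavaan)
# Population model with fixed coefficients, variances, and covariance
popModel <- "
  y ~ 0.3*x + 0.2*z
  y ~~ 1*y
  x ~~ 1*x
  z ~~ 1*z
  x ~~ 0.1*z
"
set.seed(1)
db <- simulateData(popModel, sample.nobs = 400)
head(db)
```

Each call draws a new sample from the specified population, so the loop only needs the simulateData() call and the model fit.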
while loop
If you don’t want to manually change N each time, you can nest multiple for loops one into the other, or use a while loop.
iter = 1000 # How many iterations
p = c() # Save only what we are interested in
power_at_n = c(0) # Here we calculate the power every time a cycle ends
N = 5000 # The sample size of the first cycle
k = 2 # This is only needed to make the while loop work
sample = c() # Just to keep track of the 'N' used
while (power_at_n[k-1] < 0.80) { # Until power_at_n reaches 0.80, we continue
for (i in 1:iter) {
db <- data.frame(mvrnorm(n = N, mu = c(0, 0, 0, 0, 0),
CovMat, empirical = FALSE))
m <- lm(Gp ~ Ad + Em + Se + Sr, data = db)
p[i] <- summary(m)$coefficients[2, 4]
}
power_at_n[k] <- mean(p < .05) # Calculate the power
sample[k-1] <- N # Save the used N
N = N + 500 # Increase the sample size
k = k + 1 # Move on to the next cycle
}

The same concept of power for statistical significance can be applied to any analysis and statistic. For example, it might be interesting to compare alternative models (e.g., a hierarchical vs an oblique factor model).
How to do it is pretty straightforward.
Let’s say we have a four-factor structure with three items per factor. We will keep loadings always equal for simplicity.
# PREPARE THE MODEL FOR SIMULATION
set.seed(12)
N <- 450; l <- .65
# Step 1. Simulate the latent variables (hierarchical model)
hfactor <- rnorm(N)
# Step 2. Simulate the four specific factors
s1 <- l*hfactor + rnorm(N); s2 <- l*hfactor + rnorm(N)
s3 <- l*hfactor + rnorm(N); s4 <- l*hfactor + rnorm(N)
# Step 3. Simulate the items of each specific factor
d <- matrix(nrow = N, ncol = 12)
colnames(d) <- paste0(
rep(c("s1", "s2", "s3", "s4"), each = 3),
rep(c(".1", ".2", ".3"), 4)
)
for (i in 1:3) {
d[, i] <- l*s1 + rnorm(N)
d[, i+3] <- l*s2 + rnorm(N)
d[, i+6] <- l*s3 + rnorm(N)
d[, i+9] <- l*s4 + rnorm(N)
}
ct <- cor(d) # correlation matrix

We can simulate the whole process each time, or use the correlation matrix as input for the simulation. You can do it either way. For reasons of space, let’s use the correlation matrix ct.
# Hierarchical model
mH <- "
s1 =~ s1.1 + s1.2 + s1.3
s2 =~ s2.1 + s2.2 + s2.3
s3 =~ s3.1 + s3.2 + s3.3
s4 =~ s4.1 + s4.2 + s4.3
lv =~ s1 + s2 + s3 + s4
"
# Oblique model
mO <- "
s1 =~ s1.1 + s1.2 + s1.3
s2 =~ s2.1 + s2.2 + s2.3
s3 =~ s3.1 + s3.2 + s3.3
s4 =~ s4.1 + s4.2 + s4.3
s1 ~~ s2 + s3 + s4
s2 ~~ s3 + s4
s3 ~~ s4
"
iter <- 1000
N <- 200
fi <- c("cfi", "rmsea", "bic", "aic")

library(MASS)
library(lavaan)
fit <- matrix(ncol = 4, nrow = iter)
compare <- matrix(ncol = 4, nrow = iter)
for (i in 1:iter) {
tryCatch({
db <- data.frame(mvrnorm(n = N, mu = rep(0, 12),
ct, empirical = FALSE))
fH <- sem(mH, data = db)
fO <- sem(mO, data = db)
fitH <- fitmeasures(fH, fi)
fitO <- fitmeasures(fO, fi)
fit[i, ] <- c(
ifelse(fitH["cfi"] > .95, 1, 0),
ifelse(fitH["rmsea"] < .08, 1, 0),
ifelse(fitO["cfi"] > .95, 1, 0),
ifelse(fitO["rmsea"] < .08, 1, 0)
)
compare[i, ] <- c(
ifelse(fitH["cfi"] > fitO["cfi"], 1, 0),
ifelse(fitH["rmsea"] < fitO["rmsea"], 1, 0),
ifelse(fitH["bic"] < fitO["bic"], 1, 0),
ifelse(fitH["aic"] < fitO["aic"], 1, 0)
)
}, error = function(e) {
# Use <<- so the assignment reaches the matrices outside the handler
fit[i, ] <<- NA
compare[i, ] <<- NA
})
}
lavaan
By default, lavaan uses listwise deletion for missing data. With the missing argument you can switch to "pairwise" deletion or to full-information maximum likelihood ("ml" or "fiml"). If you set estimator = "MLR", you will also estimate robust standard errors. You can also set standard errors with the se argument: "robust" or "robust.mlm" or "robust.mlr", "bootstrap", "none".

options(digits = 2)
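A minimal sketch of these arguments, using lavaan's built-in HolzingerSwineford1939 data with some artificially deleted values (the model and the deletions are illustrative):

```r
library(lavaan)
m <- "visual =~ x1 + x2 + x3"
d <- HolzingerSwineford1939
d$x1[1:20] <- NA # inject missing values for illustration
# FIML for missingness, MLR for robust standard errors
fit_fiml <- cfa(m, data = d, missing = "fiml", estimator = "MLR")
lavInspect(fit_fiml, "nobs") # all 301 cases retained
```

With the default listwise deletion, the 20 incomplete cases would simply be dropped.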
# THESE ARE THE SAME DATA OF THE SEM SLIDES
d <- readxl::read_excel("../data/Dmissing.xlsx")
colSums(apply(d, 2, is.na))
sum(complete.cases(d))
sum(complete.cases(d)) / nrow(d)
m <- "
# CFA model
peerPressure =~ PP1 + PP2 + PP3 + PP4
socialMedia =~ SM1 + SM2 + SM3 + SM4
socialComparison =~ SC1 + SC2 + SC3
eatingDisorder =~ ED1 + ED2
# Structural model
eatingDisorder ~ socialComparison
socialComparison ~ peerPressure + socialMedia
"

# Assumed here: 'fit' uses the default listwise deletion, 'fitna' uses FIML
fit <- sem(m, data = d)
fitna <- sem(m, data = d, missing = "fiml")
p00 <- parameterEstimates(fit, standardized = TRUE)[1:16, 1:3]
pt1 <- parameterEstimates(fit, standardized = TRUE)[1:16, 11]
p0 <- rep("||", 16)
pt2 <- parameterEstimates(fitna, standardized = TRUE)[1:16, 11]
pt <- cbind(p00, p0, pt1, p0, pt2)
pt[, c(5, 7)] <- round(pt[, c(5, 7)], 3)
names(pt)[c(4, 6)] <- "||"
knitr::kable(pt)

If you are interested in Bayesian analyses, these can also be applied in the SEM framework.
This is not the place where we will learn Bayesian analysis, but let’s see how to code it using blavaan.
blavaan
# install.packages("blavaan")
library(blavaan)
model <- '
# latent variable definitions
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual covariances
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
bfit <- bsem(model, data = PoliticalDemocracy)

But of course you need priors.
dpriors(bfit)
nu = "normal(0,32)" # MV intercept
alpha = "normal(0,10)" # LV intercept
lambda = "normal(0,10)" # loading
beta = "normal(0,10)" # regression
theta = "gamma(1,.5)[sd]" # MV precision
psi = "gamma(1,.5)[sd]" # LV precision
rho = "beta(1,1)" # correlation
ibpsi = "wishart(3,iden)" # covariance matrix
tau = "normal(0,1.5)" # threshold
# AND YOU CAN CHANGE THEM
mydp <- dpriors(lambda = "normal(1,2)")

The priors we just saw are used to set the same prior on all parameters of a given kind (e.g., all loadings).
We can also (and probably should) set priors for individual parameters. This is done within the model specification.
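In blavaan, an individual prior is attached to a parameter with the prior() modifier in the model syntax; a minimal sketch (the prior values here are illustrative):

```r
# One loading gets its own prior; the others keep the dpriors defaults
model_ip <- '
  ind60 =~ x1 + prior("normal(1,.5)")*x2 + x3
'
# bfit_ip <- bsem(model_ip, data = PoliticalDemocracy) # then fit as before
```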
The story about fit indices does not end here. There is much more, and much more is still going on.
Groskurth et al. (2023): https://doi.org/10.3758/s13428-023-02193-3
Fit indices depend not only on model fit/misfit, but also on intrinsic characteristics of the model/analysis.
Dynamic shiny app: https://www.dynamicfit.app/connect/
McNeish & Wolf (2023): https://doi.org/10.1037/met0000425
lavaan basics: https://users.ugent.be/~yrosseel/lavaan/lavaan2.pdf
blavaan introduction: https://ecmerkle.github.io/blavaan/index.html